Here we report the generation and characterization of 84 mouse ES cell lines with doxycycline-controllable 
transcription factors (TFs) which, together with the previous 53 lines, cover 7-10% of all TFs encoded in the 
mouse genome. Global gene expression profiles of all 137 lines after the induction of TFs for 48 hr can 
associate each TF with the direction of ES cell differentiation, regulatory pathways, and mouse phenotypes. 
These cell lines and microarray data provide building blocks for a variety of future biomedical research 
applications as a community resource. 

Mammalian genomes encode 1,500-2,000 transcription factors (TFs), which cross-regulate one another 
to form the network of TFs. The network controls the transcriptome of cells, thereby defining the 
identity of cells. A powerful approach to deciphering such a complex network is the systematic 
perturbation of individual TFs followed by global gene expression profiling. 

Results 

Here we report the generation of mouse embryonic stem (ES) cell lines, each of which has been engineered by 
integrating an expression cassette of a specific transcription factor (TF) into the ubiquitously expressed 
Rosa26 locus (Fig. 1a). The Rosa26 locus drives relatively uniform expression of the exogenous copy (transgene) 
of a TF, which is repressed by doxycycline (Dox) and can be induced in Dox- cell culture conditions (Fig. 1b). 
Combined with the 53 ES lines reported previously, we present a total of 137 ES cell lines. The majority of the 
manipulated genes were TFs, which were selected from a set of high-priority genes involved in critical functions in 
mouse ES cells and their differentiation. To ensure the quality of these ES cell lines, we implemented rigorous QC 
steps that have been described previously in detail. As a part of the characterization of these ES cell lines, we 
carried out global gene expression profiling by DNA microarrays 48 hours after TF induction (Fig. 1c; GEO 
accession number, GSE31381). The induction of a TF was confirmed by qRT-PCR (Fig. 1d, Supplementary Table 
1 for primer pairs). The effect of TF induction on the transcriptome of mouse ES cells was highly variable (Fig. 1e; 
Supplementary Table 2). Ranked by the number of genes significantly changed in expression (FDR < 0.05, fold 
change > 1.5), the top 10% of studied TFs changed 4676 genes on average (e.g., Dmrt1), whereas the bottom 50% 
of TFs caused significant changes in expression in only 54.5 genes on average (e.g., Mbd3) (Fig. 1c, d). 

To further characterize the transcriptome alterations caused by each TF, we compared our microarray data 
with 3 public databases: the gene expression profiles of many mouse organs/tissues at The Genomics Institute of 
the Novartis Research Foundation (GNF) (ver. 2 & 3), the Genetic Association Database (GAD) on gene sets 
associated with mouse phenotypes, and the MSigDB database (ver. 3) of gene sets associated with signaling 
pathways and cellular functions. Because the GNF database is quantitative and the two other databases are 
qualitative, we used different methods to quantify association: correlation of median-subtracted log-transformed 
gene expression values for the GNF database, and Parametric Analysis of Gene Set Enrichment (PAGE) for the GAD 
and MSigDB databases (see Supplementary Methods). 

A comparison of our microarray data with the GNF database showed that the induction of a TF in ES cells often 
initiates the differentiation of ES cells into specific cell types as soon as 48 hr later, when cells do not yet exhibit 
any overt phenotypes (Fig. 2 for GNF ver. 3; Supplementary Fig. 1 for GNF ver. 2). For example, the transcriptome 
of ES cells shifted toward a neural profile after the induction of Sox9, Foxg1, Klf3, or Pou5f1; toward endoderm 
after the induction of Hnf4a, Gata2, Gata3, or Esx1; and toward skeletal muscle and heart after the induction of 
Myod1 or Mef2c. Similarly, the transcriptome of ES cells shifted toward hematopoietic cell lineages after the 
induction of Sfpi1, Elf1, or Irf2; and toward T-cells and thymocytes after the induction of Elf5 or Tgif1. 
Interestingly, TFs associated positively with transcriptome changes toward specific lineages showed a negative 
association with those toward different cell lineages (Fig. 2). For example, TFs associated with transcriptome 
changes toward neural tissues were negatively associated with those toward hematopoietic lineages (e.g., Sox9 and 
Foxg1 in Fig. 2), and vice versa (e.g., Irf2, Elf1, Sfpi1 in Fig. 2). These data suggest that TF networks are 
organized to cross-regulate as if different tissue lineages are mutually exclusive. 

Figure 1 | Induction of transcription factors (TFs) in ES cells: (a) plasmid structure that includes loxP recombination sites, puromycin resistance gene, 
open reading frame (ORF) of a TF with hCMV promoter followed by His6-FLAG tag; (b) schematic diagram showing the expression of transgenic TF 
induced in Dox- conditions; (c) examples of scatterplots of gene expression in Dox- versus Dox+ condition. Green and red dots indicate genes that are 
differentially expressed with statistical significance (FDR<0.05, change >1.5 fold); (d) increase of transcription factor expression after the induction of a 
transgene, as measured by qPCR (Dox- vs. Dox+); results from two biological replicates (3 technical replicates each); error bars (S.E.M.; ANOVA); and 
dashed line = 2 fold change; (e) a list of TFs and the number of genes up- or down-regulated by the induction of the TF (FDR<0.05, change >1.5 fold) 
(Supplementary Table S2). 

A comparison of our microarray data with the GAD database 
identified associations of TFs with mouse phenotypes (Fig. 3). 
Many newly identified associations are consistent with published 
data. For example, Hoxa2 was associated with the pancreatic alpha 
and beta cells; Foxc1, with hair follicle/shaft; and Sox11, with 
skeletal defects. A comparison of our microarray data with the 
MSigDB database identified the association of each TF with specific 
cells and pathways (Fig. 4). For example, Smad6 was associated with 
keratinocytes; Myod1, with alveolar rhabdomyosarcoma; and 
Hnf4a, with lipoproteins. 

Figure 2 | Correlation of gene expression response to the induction of TFs with tissue-specific gene expression from the GNF ver. 3 database. 



Discussion 

The collection of mouse ES cell lines reported here is freely available 
to the research community (http://esbank.nia.nih.gov/index.html). 
The analysis presented here can help researchers select ES cell lines 
suitable for their own research programs. For example, these 
TF-manipulable ES cell lines can be used to study the complex 
mechanisms of ES cell differentiation toward specific lineages. These ES cell 
lines are also adaptable to a variety of experiments and analyses, as 
shown in our previous report. For example, each TF is C-terminally 
tagged with His6-FLAG, which simplifies studies of TF localization, 
protein-protein interactions, and protein-DNA interactions. Further 
mining of the microarray results reported here, as well as additional 
experiments with the provided ES cell lines and their derivatives, will 
yield more insight into gene regulatory networks. Carrying out 
similar experiments for more regulatory proteins (ideally for all TFs and 
additional signaling proteins) should give an increasingly complete 
picture of gene regulation in mammalian cells 
and organs. 

Methods 

Derivation of transgenic ES cell lines. ES cell lines with inducible TF transgenes were 
derived from MC1 mouse ES cells (129S6/SvEvTac), passage 17. Cells were cultured 
in DMEM with 15% FBS and LIF on feeder cells. Cells were electroporated with a 
linearized pMWROSATcH vector and selected by hygromycin B. Knock-in for 
ROSA-TET locus was confirmed by Southern blotting. For exchange vectors, 
PCR-amplified ORFs were subcloned into pZhcSfi that was modified to express a 
His6-FLAG tagged protein and puromycin resistance gene. ES cells were co-transfected 
with a sequence-verified exchange vector and pCAGGS-Cre and selected by 
puromycin in the presence of doxycycline (Dox). Isolated clones were tested for 
Venus expression, hygromycin B susceptibility, transgene RNA expression, 
genotyping for Cre-mediated integration, and mycoplasma contamination. 

Figure 3 | Enrichment of gene sets associated with mouse phenotypes from GAD database among genes that were upregulated (positive) or 
downregulated (negative) after the induction of various TFs. 

Gene expression analysis of cells with induced TFs. ES cells (passage 25) were 
cultured in the standard LIF+ medium with Dox+ on a gelatin-coated dish 
throughout the experiments. Cells from each cell line were split into 6 wells and the 
medium was changed 24 hr after cell plating: 3 wells with Dox+ medium, and 3 wells 
with Dox- medium to induce transgenic TFs. Dox was removed via washing 3 times 
with PBS at 3 hour intervals. Total RNA was isolated by TRIzol (Invitrogen) after 
48 hr, and two replicates were used for real-time qPCR (see primers in 
Supplementary Table S1) and for microarray hybridization. Labeled targets were 
prepared from total RNA with the Low RNA Input Fluorescent Linear Amplification Kit 
(Agilent). For most TFs, we hybridized a Cy3-CTP labeled sample from Dox- medium 
together with a Cy5-CTP labeled sample from Dox+ medium. For 7 TFs, however, we 
labeled samples from Dox- and Dox+ with Cy3, and hybridized them independently 
with a Cy5-labeled reference target, which is a mixture of Stratagene Universal Mouse 
Reference RNA and MC1 cell RNA (this method requires twice as many 
arrays). Analysis showed that both methods produce results of comparable quality. 
Targets were hybridized to the NIA Mouse 44K Microarray v3.0 (Agilent, design ID 
015087). Slides were scanned with Agilent DNA Microarray Scanner. All DNA 
microarray data are available in Supplementary Table S2, at GEO/NCBI 
(http://www.ncbi.nlm.nih.gov/geo; accession number GSE31381), and at NIA Array 
Analysis software (http://lgsun.grc.nia.nih.gov/ANOVA). 

Figure 4 | Enrichment of gene sets associated with various functions and signaling pathways from MSigDB ver. 3 database among genes that were 
upregulated (positive) or downregulated (negative) after the induction of various TFs. 

Normalization of microarray data and detection of outliers. Two methods of array 
hybridization were used in this study: (1) RNA extracted from cells with induced 
transcription factors (TFs) (cultured in Dox- conditions) and from control cells 
(cultured in Dox+ conditions) was Cy3 labeled and hybridized on separate arrays 
together with reference RNA labeled with Cy5; and (2) RNA extracted from cells with 
induced TFs (Dox-) was labeled with Cy3 and hybridized together with RNA from 
control cells (Dox+), which was labeled with Cy5. The second method does not use 
reference RNA. Data processing depended on the method of hybridization. Potential 
Cy3/Cy5 bias in microarrays with the hybridization of Dox- vs. Dox+ samples was 
removed by normalization to the median logratio of gene expression change in all 
TF-manipulation experiments. The details of the method are available in Supplementary 
Information. 
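
The dye-bias correction for direct Dox- vs. Dox+ hybridizations can be illustrated with a minimal sketch. It assumes that the median is taken per gene across all TF experiments and subtracted from each experiment's logratio; the full procedure is given only in the Supplementary Information and may differ. The function name and data layout are hypothetical.

```python
from statistics import median

def remove_dye_bias(logratios):
    """Subtract the per-gene median logratio across experiments.

    logratios: dict mapping experiment name -> list of per-gene logratios
    (Dox- vs. Dox+), with all lists in the same gene order.
    Assumption: the Cy3/Cy5 bias is gene-specific and shared by all
    experiments, so the median over experiments estimates it.
    """
    experiments = list(logratios)
    n_genes = len(logratios[experiments[0]])
    gene_medians = [
        median(logratios[e][g] for e in experiments) for g in range(n_genes)
    ]
    return {
        e: [logratios[e][g] - gene_medians[g] for g in range(n_genes)]
        for e in experiments
    }
```

Because most TFs change only a small fraction of genes, the median across many experiments is dominated by unresponsive arrays, which is why it can serve as an estimate of the systematic bias under this assumption.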

Statistical analysis of microarray data. For statistical analysis we used NIA Array 
Analysis, which estimates the False Discovery Rate (FDR) to account for multiple 
hypothesis testing. Response of genes to the induction of TFs was measured as a 
logratio (i.e., difference between means of log-transformed intensities) between 
manipulated (Dox-) and control (Dox+) cells. We considered a gene expression 
change significant if the logratio was significantly different from zero (FDR < 0.05) 
and the change of expression was >1.5 fold. 
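
The two-part significance criterion can be sketched as follows. The Benjamini-Hochberg procedure is used here as a stand-in for the FDR estimate, since the text states only that NIA Array Analysis estimates the FDR; the base-10 logarithm and all function names are likewise assumptions made for illustration.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant_genes(logratios, pvalues, fdr_cutoff=0.05,
                      fold_cutoff=1.5, log_base=10):
    """Indices of genes with FDR < fdr_cutoff and fold change > fold_cutoff.

    logratios: mean(log Dox-) - mean(log Dox+) per gene, in log_base
    (base 10 is an assumption, not stated in the text).
    """
    fdr = benjamini_hochberg(pvalues)
    min_logratio = math.log(fold_cutoff, log_base)
    return [
        i for i, (lr, q) in enumerate(zip(logratios, fdr))
        if q < fdr_cutoff and abs(lr) > min_logratio
    ]
```

Applying both filters together excludes genes whose change is statistically detectable but small, as well as genes with large but poorly replicated changes.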

Correlation with tissue-specific gene expression. Association of gene expression 
changes induced by TF manipulation with tissue-specific gene expression was 
evaluated based on the correlation between our microarray results and the GNF 
database. Correlation was estimated between gene expression responses to TF 
manipulation (logratio of Dox- vs. Dox+) and median-centered log-transformed 
gene expression in various tissues from the GNF database (ver. 2 and 3). Because the 
importance of genes in ES cells and adult tissues may be different and the different 
platforms of microarrays used in these studies are not 100% compatible, we applied 
correlation analysis to a subset of genes that are highly expressed and dynamic in both 
data sets. We selected 10,000 genes in each database with the highest score, equal to the 
product of average log-expression and standard deviation of expression (after 
induction of various TFs or in different tissues), and then took the intersecting 
portion of 5,595 genes for GNF ver. 3 (5,295 genes for ver. 2). Then, correlation values 
and corresponding z-values were estimated based on this subset of genes. The matrix 
was sorted using hierarchical clustering, TMEV, ver. 3.1. 
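
The gene selection and correlation steps described above can be sketched in a few lines. This is an illustration under stated assumptions: the data layouts and function names are hypothetical, Pearson correlation is assumed, and the Fisher transformation in `correlation_z` is only one plausible way to obtain z-values, since the text does not say how they were computed.

```python
import math
from statistics import mean, median, stdev

def top_genes(expr, n_top=10000):
    """Select genes with the highest (mean log-expression x SD) score.

    expr: dict gene -> list of log-expression values across conditions
    (TF inductions for the ES data, tissues for the GNF data).
    """
    scores = {g: mean(v) * stdev(v) for g, v in expr.items()}
    return set(sorted(scores, key=scores.get, reverse=True)[:n_top])

def median_center(expr):
    """Subtract each gene's median across tissues from its values."""
    return {g: [x - median(v) for x in v] for g, v in expr.items()}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def tissue_correlation(response, centered, genes, tissue_index):
    """Correlate TF-induced logratios with one tissue's centered profile."""
    x = [response[g] for g in genes]
    y = [centered[g][tissue_index] for g in genes]
    return pearson(x, y)

def correlation_z(r, n):
    """Assumed z-value: Fisher transform of r scaled by sqrt(n - 3)."""
    return math.atanh(r) * math.sqrt(n - 3)
```

In use, `top_genes` would be applied to each data set separately and the two resulting sets intersected (for example with `es_top & gnf_top`) before calling `tissue_correlation` for every TF and tissue pair.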

Analysis of gene set enrichment. Enrichment of target genes in subsets of genes that 
are upregulated and/or downregulated following the manipulation of the TF was 
quantified using a modified Parametric Analysis of Gene Set Enrichment (PAGE). 
PAGE is based on the comparison of the average expression change in a specific 
subset of genes, $\bar{x}_{set}$, with the average expression change in all genes, $\bar{x}_{all}$: 

$$z = \frac{(\bar{x}_{set} - \bar{x}_{all})\,\sqrt{n_{set}}}{SD_{all}} \qquad (1)$$

where $n_{set}$ is the size of the gene set and $SD_{all}$ is the standard deviation of expression 
change among all genes. We modified this method by applying equation (1) to the 
subset of N top upregulated and another subset of N top downregulated genes rather 
than to all genes combined, which allowed us to detect the enrichment of the same 
gene set among both upregulated and downregulated genes. The value of N = 5000 
was selected empirically because it appeared that the enrichment of genes with TF 
binding sites is always limited to the top 5000 upregulated or downregulated genes. 
The probability distribution of expression change within subsets of N upregulated 
and downregulated genes is not normal; however, because we compare averages for 
large sets of genes (usually, $n_{set}$ is >50), the probability distribution of these averages 
is close to normal based on the central limit theorem. Thus, it is reasonable to use 
equation (1) as an approximation. In the case when both upregulated and 
downregulated genes were enriched in a specific functional gene set, we subtracted the 
smaller z-value from both z-values. The matrix of z-values was first sorted using 
hierarchical clustering, TMEV, ver. 3.1, and then manually converted to a 
semi-diagonal form. 
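
A minimal sketch of equation (1) and of the modification described above follows. Several details are assumptions rather than statements from the text: the mean and standard deviation are computed within each top-N subset, the z-value for the downregulated subset is sign-flipped so that positive values mean enrichment in both directions, and "smaller" is read as the smaller of the two enrichment scores. All names are hypothetical.

```python
import math
from statistics import mean, pstdev

def page_z(changes, gene_set):
    """Equation (1): z = (mean_set - mean_all) * sqrt(n_set) / SD_all.

    changes: dict gene -> expression change (logratio).
    gene_set: iterable of gene names; genes absent from changes are ignored.
    """
    values = list(changes.values())
    in_set = [changes[g] for g in gene_set if g in changes]
    sd_all = pstdev(values)
    if not in_set or sd_all == 0:
        return 0.0
    return (mean(in_set) - mean(values)) * math.sqrt(len(in_set)) / sd_all

def modified_page(changes, gene_set, n_top=5000):
    """Return (z_up, z_down) enrichment scores for the top-N subsets.

    Assumption: z_down is sign-flipped so that a positive value means
    the gene set is enriched among the most downregulated genes.
    """
    ranked = sorted(changes, key=changes.get, reverse=True)
    up = {g: changes[g] for g in ranked[:n_top]}
    down = {g: changes[g] for g in ranked[-n_top:]}
    z_up = page_z(up, gene_set)
    z_down = -page_z(down, gene_set)
    # If the set is enriched in both directions, subtract the smaller score.
    if z_up > 0 and z_down > 0:
        smaller = min(z_up, z_down)
        z_up, z_down = z_up - smaller, z_down - smaller
    return z_up, z_down
```

Scoring the two tails separately is what lets a gene set register as enriched among upregulated genes even when its members are, on average, unchanged across the whole array.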